Flying Yellow Elephant: Predictable and Efficient MapReduce in the Cloud
Authors
Abstract
Today, growing datasets require new technologies, because standard technologies such as parallel DBMSs do not easily scale to such sizes. On the one hand, there is the MapReduce paradigm, which allows non-expert users to easily define large distributed jobs. On the other hand, there is Cloud Computing, which provides a pay-as-you-go infrastructure for such computations. This PhD project aims at improving the combination of both technologies, especially with respect to the following issues: (i) predictability of performance, (ii) runtime optimization, and (iii) Cloud-aware scheduling. Left unaddressed, these issues can result in significant runtime overhead or non-optimal use of computing resources, which in a Cloud setting translates directly into high monetary cost. We present preliminary results that confirm a significant improvement in performance when addressing some of these issues. Further, we discuss research challenges and initial ideas for the above-mentioned issues.
Similar resources
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations, because the rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...
Only Aggressive Elephants are Fast Elephants
Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider’s orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We ...
Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)
MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run complex analytical tasks over very large data sets on very large clusters and clouds. However, this comes at a price: MapReduce processes tasks in a scan-oriented fashion. Hence, the performance of Hadoop — an open-sourc...
Efficient Entity Matching over Multiple Data Sources with MapReduce
The execution of data-intensive tasks such as entity matching on large data sources has become a common demand in the era of Big Data. To face this challenge, cloud computing has proven to be a powerful ally to efficiently parallelize the execution of such tasks. In this work we investigate how to efficiently perform entity matching over multiple large data sources using the MapReduce programming mo...
An Investigation on Scheduling Policies for Cloud-based Software Systems
Background: The rapid diffusion of cloud computing technology has been a focus of interest for enterprises due to its higher scalability and availability and greater elasticity. Nevertheless, the limited scheduling mechanisms for running applications in the cloud have been a major challenge. Aim: This project introduces an effective scheduling algorithm, which attempts to maximize cloud resource...
Publication date: 2010